Internet Search Engine Freshness by Web Server Help
Authors
Abstract
We study how to keep Internet search engines up to date with the changes occurring at the various web servers on the Internet. Currently, web search engines poll web servers on a per-URL basis to obtain update information. We advocate an approach in which the web servers themselves track changes to their content files and propagate updates to search engines. We propose an algorithm that uses both the freshness and the popularity of data at the web servers to measure the discrepancy between a web site and a search engine. The algorithm batches the push of updates from the web server to the search engine. We prove that this algorithm is competitive with an optimal algorithm.
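The abstract does not give the algorithm itself, so the following is only an illustrative sketch of server-side batched pushing, not the authors' method. It assumes that discrepancy is the summed popularity of pages that have changed since the last push, and that a batch is sent once that sum reaches a fixed threshold. The class name `UpdateBatcher`, the `threshold` parameter, and the popularity weights are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateBatcher:
    """Hypothetical server-side tracker that batches change notifications."""

    threshold: float  # assumed tuning knob, not taken from the paper
    pending: dict = field(default_factory=dict)  # url -> popularity weight

    def record_change(self, url: str, popularity: float) -> list:
        """Note that `url` changed; return the URLs to push, or [] if none."""
        # A page that changes repeatedly is counted once: it is stale either way.
        self.pending[url] = popularity
        if self.discrepancy() >= self.threshold:
            return self.flush()
        return []

    def discrepancy(self) -> float:
        """Assumed score: total popularity of pages stale at the search engine."""
        return sum(self.pending.values())

    def flush(self) -> list:
        """Emit one batch, most popular page first, and reset the pending set."""
        batch = sorted(self.pending, key=self.pending.get, reverse=True)
        self.pending.clear()
        return batch
```

In this sketch, a rarely visited page can change many times without triggering a push, while a change to a popular page fills the batch quickly. That is one plausible reading of combining freshness with popularity.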
Similar resources
Workload-Aware Web Crawling and Server Workload Detection
With the development of search engines, more and more web crawlers are used to gather web pages. The rising crawling traffic has raised the concern that crawlers may impact web sites. On the other hand, a more efficient crawling strategy is required for the coverage and freshness of the search engine index. In this paper, crawlers of several major search engines are analyzed using one six-months acc...
An Approach to Design Incremental Parallel Webcrawler
The World Wide Web (WWW) is a huge repository of interlinked hypertext documents known as web pages. Users access these hypertext documents via the Internet. Since its inception in 1990, the WWW has grown manyfold in size, and it now contains more than 50 billion publicly accessible web documents distributed all over the world on thousands of web servers, and it is still growing at an exponential rate. It is very...
Reducing Network Traffic and Managing Volatile Web Contents Using Migrating Crawlers with Table of Variable Information
As the size of the web continues to grow, searching it for useful information has become increasingly difficult. Studies also report that a substantial share of current Internet traffic and bandwidth consumption is due to the web crawlers that retrieve pages for indexing by the different search engines. Moreover, due to the dynamic nature of the web, it becomes very difficult for a search engine to prov...
Improving the Information Retrieval in the World Wide Web
In this paper we present a visualization system suitable for installation on any Internet search engine or directory. The system is based on a new user interface, which makes the user's search and navigation through the results more comfortable. This new interface consists of just one window, and all the Web pages selected by users are downloaded in the background, without disturbing the user int...
Use of Fuzzy C-Means Algorithm for Web Proxy Server Performance Improvement
Nowadays the web is loaded with a large number of user requests, which creates a lot of traffic. As the requests increase, the resources on the World Wide Web are also growing to a large extent. In addition, the services and applications provided by the web are directly proportional to its growth. For this reason, web traffic is huge, and to gain access to these resources incurs user-pe...
Journal title:
Volume and issue:
Pages: -
Publication date: 2001